webAI is putting private, on-device intelligence in every pocket
Plus: CEO David Stout explains how they're running massive models on hardware you already own...

CV Deep Dive
Today we're talking with David Stout, co-founder and CEO of webAI.
Founded in late 2019, webAI started with a bold thesis: the most valuable AI systems of the future wouldn't live in centralized clouds; they'd live on devices people already own. Phones. Laptops. Embedded boards. Close to the data. Private. Offline-capable. And talking to one another like a mesh of intelligence.
David's journey began on a farm in northern Michigan, bounced through business school and early AI labs, and eventually led to a fundamental question: can we run serious AI locally, at scale, and without a cloud connection?
With a few roommates, David ported the YOLO-based Darknet model to an iPhone just to see if it would compile. That prototype kickstarted years of work on AI-native infrastructure (custom runtimes, device-level quantization, and a communication layer that lets models talk to each other across devices) and the founding of webAI.
Key Takeaways
Edge as the default: webAI enables large models to run on user-owned hardware, no hyperscaler required.
Context over scale: They believe superintelligence will emerge from millions of contextual models, not a single giant one.
Entropy-Weighted Quantization (EWQ): Their open-source technique dynamically quantizes models layer-by-layer to optimize for device performance without sacrificing accuracy.
Mission-critical traction: Deployed use cases span aerospace, precision medicine, robotics, and defense.
Tooling for builders: The Navigator and Companion platforms let teams create and deploy private AI without writing infra code.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with David 💬
David - welcome to Cerebral Valley! How did webAI begin?
I grew up on a farm in northern Michigan, then went to a small business school and bounced through a couple of universities before an AI lab finally hooked me in 2016, back when "AI" mostly meant hand-rolled machine-learning pipelines. I was fascinated by NLP and computer vision, but the idea that kept me up at night was: could we run serious models on the hardware people already own?
My roommates and I ported Darknet, an early YOLO model, to an iPhone just to see if it would compile. It did, barely, and that small victory convinced us the future of AI could be personal and live directly on devices users own and control.
We officially incorporated on December 26, 2019, and called it webAI: not about being an internet AI startup, but about creating a true web of models. Millions of specialized AI experts, working together, cooperating, disagreeing, cross-checking, but ultimately solving problems together.
What's the one-liner pitch for webAI?
We help enterprises build private, high-performance, high-accuracy AI models that run on hardware they own. If you want a model that works on a plane with no internet, or a healthcare tool that never leaves a secure perimeter, we're the best partner for that.
Just finished our first round of Meta's Llama 4 testing in Navigator. The numbers are even better than we expected.
Our webFrame technology on Apple Silicon is delivering:
- Maverick (unquantized): 13 tokens/second
- Maverick (4-bit): 52 tokens/second
These numbers are ONLY…
David Stout (@davidpstout) · 2:26 PM · Apr 9, 2025
What's the long-term vision behind the "web of models"?
We see the future of intelligence as driven by highly contextual, specialized models working together, rather than by scaling a single foundational model. It's not about one huge model; it's about coordinating many. That means building an edge-class runtime, a model-to-model communication layer, and an application layer that lets users and builders easily interact with all of it. That's what Navigator and Companion do. The result is fast, private, low-latency AI that works wherever it's needed.
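As a purely illustrative sketch (not webAI's actual runtime or communication protocol), the "web of models" idea can be pictured as a coordinator that routes a request to specialized local experts and merges their answers. The expert names and the keyword-based routing rule below are hypothetical stand-ins:

```python
# Toy "web of models": a coordinator fans a query out to specialized local
# experts and merges their outputs. Expert names and the routing rule are
# hypothetical, chosen only to illustrate the coordination pattern.
from typing import Callable

Expert = Callable[[str], str]

experts: dict[str, Expert] = {
    "vision":   lambda q: f"[vision model] analyzed: {q}",
    "language": lambda q: f"[language model] answered: {q}",
    "audio":    lambda q: f"[audio model] transcribed: {q}",
}

def route(query: str) -> list[str]:
    """Pick which specialized models should handle the query.
    Keyword matching here; a real system would use a learned router
    or a model-to-model protocol."""
    picks = [name for name in experts if name in query.lower()]
    return picks or ["language"]

def answer(query: str) -> str:
    """Send the query to the selected experts and merge their outputs."""
    return "\n".join(experts[name](query) for name in route(query))

print(answer("Use the vision model to inspect this engine photo"))
```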
Which sectors are leaning in first?
The ones where mistakes aren't an option: defense, medicine, manufacturing, aerospace. These are use cases where models need to be contextual, accurate, and able to run offline. I'm especially excited about what we're doing in robotics. That's going to be huge. It's a major use case for AI that needs to run locally but also talk to other models when needed. I think it's going to be a sweet spot for us over the next couple of years. Models that live on an assembly robot and only reach out when they need help: that's where edge AI shines.
We're proud to be named to the @CBinsights AI 100 - recognizing the most promising artificial intelligence startups of 2025!
This further validates our approach to private, local AI that keeps your data on your device.
Huge thanks to our team, partners, and community who made…
webAI (@thewebAI) · 5:34 PM · Apr 28, 2025
Can you share a customer story?
One use case is aviation maintenance. A lot of value is lost when engines are sent off unnecessarily. We're working to keep more planes in the air by enabling mechanics to use AI tools locally, on devices they already carry, to make better decisions on the spot. It saves time and money, and it improves safety.
Why choose edge over private cloud?
It's not just about privacy; it's about physics and economics. Realistically, you can't send huge amounts of multimodal data (like 32 terabytes) to the cloud, process it remotely, and still achieve real-time responsiveness. Bandwidth limitations, latency issues, and GPU costs quickly become prohibitive.
In practice, it's often better and more cost-effective to run AI locally, directly where the data lives. For example, when we benchmarked inference performance on typical enterprise workloads, we found big advantages using edge hardware. We compared token-per-dollar efficiency between consumer-grade hardware (like a Mac Studio with 128GB RAM) and data-center GPUs (like an Nvidia H100). The Mac Studio setup delivered significantly higher efficiency: about 100 million tokens per dollar, versus around 12 million tokens per dollar on an H100-based system.
This doesn't mean consumer hardware replaces data-center GPUs for every scenario. But for many real-world enterprise use cases, especially those sensitive to cost, latency, or privacy (like aviation maintenance in an aircraft hangar or diagnostics within secure healthcare environments), the edge setup delivers meaningful efficiency and speed advantages. The ability to deploy locally and economically at scale is the core value of edge AI, and that's why we believe it will ultimately win out over purely centralized approaches.
What technical challenges did you face building webAI's infrastructure?
There weren't many tools for what we wanted to do. CUDA was the only real option, but it doesn't realistically work when you try to bring massive models down to laptops or phones. We didn't raise millions of dollars in 2019, so we couldn't just lease GPUs; we had to make things work with less. That scarcity-driven mindset led directly to novel innovations and forced us to build a lot ourselves: new fundamentals, runtimes, and AI libraries written directly to hardware like shader cores and BN instruction sets.
One of our biggest contributions was entropy-weighted quantization (EWQ). EWQ profiles the device in real time and dynamically quantizes each layer: some layers at four bits, some at full precision. EWQ is our only open-source contribution so far, but it's a micro-example of how resource constraints led to the internal innovations that now define webAI.
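To make the per-layer idea concrete, here is a minimal conceptual sketch of entropy-based bit-width selection. It is not webAI's actual implementation: the entropy thresholds, the available bit-widths, and the assumption that lower-entropy layers tolerate more aggressive quantization are all illustrative.

```python
# Conceptual sketch of entropy-weighted quantization: choose a per-layer
# bit-width from the Shannon entropy of that layer's weight distribution.
# Thresholds and the low-entropy -> low-bit mapping are assumptions.
import numpy as np

def weight_entropy(weights: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of a layer's weight histogram."""
    hist, _ = np.histogram(weights, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def assign_bit_widths(layers: dict[str, np.ndarray],
                      low: float = 4.5, high: float = 6.0) -> dict[str, int]:
    """Map each layer to a bit-width: lower-entropy layers get quantized
    more aggressively, higher-entropy layers keep more precision."""
    plan = {}
    for name, w in layers.items():
        h = weight_entropy(w)
        if h < low:
            plan[name] = 4    # aggressive 4-bit quantization
        elif h < high:
            plan[name] = 8    # moderate 8-bit quantization
        else:
            plan[name] = 16   # keep (near) full precision
    return plan

# Example with random stand-in weights for four layers
layers = {f"block_{i}": np.random.randn(1024, 1024).astype(np.float32) for i in range(4)}
print(assign_bit_widths(layers))
```

In a real deployment the same profiling would also fold in the target device's memory and compute budget, which is what makes the quantization plan device-specific.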
Are you working beyond LLMs?
Yes, 90% of deployments are multimodal. Vision and language, sometimes audio. Retrieval-augmented generation is a major focus. Our systems are designed so that a model can take a photo, read from a knowledge base, and reason in context, all on one device. That's the power of doing it locally.
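As a rough illustration of what fully local retrieval-augmented generation looks like (a hypothetical sketch, not webAI's Navigator pipeline; the embedding function, knowledge base, and prompt format are stand-ins), the whole embed-retrieve-prompt loop can stay on one device:

```python
# Minimal on-device RAG sketch: embed a query, retrieve from a local
# knowledge base by cosine similarity, and build a prompt for a local model.
# The embedding is a deterministic stand-in, not a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real deployment would call a local embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

knowledge_base = [
    "Engine vibration above threshold usually indicates fan imbalance.",
    "A borescope inspection is required after any bird-strike event.",
]
kb_vectors = np.stack([embed(doc) for doc in knowledge_base])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Cosine-similarity retrieval over the local knowledge base."""
    scores = kb_vectors @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [knowledge_base[i] for i in top]

def build_prompt(query: str) -> str:
    """Compose a prompt from retrieved context; a local LLM would complete it."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What should I check after a bird strike?"))
```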
Why build Companion, if you're an infra company?
Enterprises needed a way to use what they built, without needing SDKs or engineering support. With Navigator, they can build a model. With Companion, they can deploy it to everyone in the org with one click. And in June, we're rolling out features that let external partners use those models securely, without ever touching the raw weights.
What's the biggest misconception about edge AI?
People think it's far off. It's not. It's here today, and in many cases, it's cheaper and better than cloud. You don't have to sacrifice privacy or performance; you can have both. We think of webAI as the factory of intelligence. We don't sell the intelligence; we give customers the ability to build their own.
We're proud to announce our partnership with @Divergent3D to advance human-AI collaboration!
This partnership combines our distributed AI technology with Divergent's pioneering digital manufacturing capabilities to create:
- Smarter industrial robotics
- More responsive…
webAI (@thewebAI) · 8:02 PM · Apr 2, 2025
Conclusion
Stay up to date on the latest with webAI, learn more here.
Read our past few Deep Dives below:
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.